VOICE RECORDING ELECTRONIC SCHEDULER
FIELD OF THE INVENTION 

The present invention relates generally to
digital voice recording devices coupled with a programmable
daily scheduler that possesses alarm reminder options.
Functional operation of the device is accomplished either
by means of an external switch on a keypad or through
spoken voice commands. The device is also able to present
information both on a visual display and in audio (voice
synthesis). By introducing an interactive dialogue between
device and user, memory and voice recognition demands are
appreciably reduced, which is important to compact design
and portability.

BACKGROUND OF THE INVENTION 

The prior art comprises various types of scheduling
systems: hand-held, U.S. Pat. No. 4,117,542; desk-top,
U.S. Pat. No. 4,548,510; and through communication lines,
U.S. Pat. No. 4,783,800. Of these, the present invention
relates most closely to the hand-held device. The device
disclosed in U.S. Pat. No. 4,117,542 is intended for
storing and retrieving telephone numbers, street addresses,
appointments, and agenda. It has the option of offering
normal calculator functions as well. The main difficulty
with this device is that it is limited to the laborious
task of manually typing in messages and other information
on the keypad. Entering information by means of typing on
the keypad has many disadvantages. The most obvious is the
time requirement, especially for individuals not adept at
typing. Another disadvantage is that messages cannot be
entered while engaged in a task that does not permit use
of one's hands and/or vision, such as driving a motor
vehicle. Yet another disadvantage is the limitation that
only information which can be expressed by the characters
on the keypad can be entered and stored in the device (so
that Chinese and music, for example, cannot be entered on
a keypad with English characters).

To alleviate these difficulties, various
approaches have been taken, such as handwritten character
recognition and voice input. Handwritten character
recognition is disclosed in U.S. Pat. No. 4,276,541 with a
device that has a designated area on its face where the
user can write a message down. The device embeds
algorithms that proceed to decipher the writing and store
it in data format. Of course, this approach does not meet
our objective of reducing manual input.

Alternatively, we choose voice as a means for
entering information. Voice, or audio, has the advantage
of requiring only limited manual contact with the device
and eliminates the continual hand-eye coordination demanded
by a typing or writing function. In addition, audio
recording permits any language or sound to be recorded.
Prior art in voice recording and reproducing for use in
wrist watches appears in U.S. Pat. No. 4,391,530, where
voice is coupled to alarm time entries to render context to
the alarm times. An additional feature of a confirming
external switch is disclosed in U.S. Pat. No. 4,405,241.
However, the size constraint of both of these designs
compromises the extent of memory storage and the number
and types of functions for data manipulation.

Though using voice as an input means reduces the
extent of manual input, some manual input of information
and functional operation is still necessary. By using
voice in order to control a device, the simplicity of use
is even further advanced and, in some instances, manual
demands are completely eliminated. Voice control also
alleviates the process of learning to operate the device.

Prior art in voice control of a small timekeeping
device is found in U.S. Pat. No. 4,635,286, which
discloses voice control for a wristwatch. In the proposed
approach, several difficulties are evident. First, space
constraints of a wristwatch considerably limit memory size
and computing power and therefore the vocabulary size and
the number of functions that can be controlled. Second,
the described voice input means make the voice recognition
function a very difficult task because the beginning and
ending boundaries of the spoken voice command input are not
easy to extract. Third, the reference command words are
prerecorded and assumed to be speaker-independent. And
fourth, little attempt is made at recording messages (more
than single words) and the proposed method of recording
alarm times is cumbersome and requires continuous visual
and manual interaction with the device.

An additional device and method are proposed in U.S. Pat.
No. 5,014,317. In this device, the size is no longer
constrained to that of a wristwatch and the functionality
includes message recording coupled with alarm time
settings. In one of the embodiments of this patent, both
message and alarm time settings are entered through voice
means. A word spotting algorithm is used to isolate the
time and date information in order to set the alarm. The
remainder of the message is not recognized, but rather
stored for later retrieval. In the second embodiment of the
same patent, name data is recognized from vocally input
information. This approach to data entry, whereby alarm
time and date information or name information is extracted
from a sentence, is very difficult and prone to errors. To
date, recognizing even single (unconnected) words from a
limited vocabulary is a difficult problem when implemented
in a real world environment with background noise and
variances in any one person's speech patterns. For example,
AT&T's DSP16 series of speaker-dependent voice recognition
components, which are user trained to recognize only the
trainer's spoken words, has a success rate of 95% for
unconnected words. The problem is compounded when the
system is designed for speaker-independent applications,
where speech pattern variations from speaker to speaker
must be accounted for as well.

In the method proposed in U.S. Pat. No.
5,014,317, both speaker-independent voice recognition and
word-spotting are employed, thus requiring the use of
speaker-independent recognition algorithms and the added
complexity of locating a particular word in a string of
spoken words. In addition, no attempt is made at
controlling the device using the spoken words.

Accordingly, a primary goal of the present
invention is to provide a device in which use is made of
voice entry not only to enter messages and alarm times but
also to control functions of the device, such as switching
between modes of operation (message entry and alarm time
entry). The proposed method of accomplishing this goal
removes the need for visual contact and reduces memory and
voice recognition requirements.

SUMMARY OF THE INVENTION 

This invention is intended to offer a means of
keeping a daily schedule/agenda in a simple and easy to use
fashion. Messages and appointments are stored either by
recording an audio signal (e.g., voice) or by typing
manually on an alphanumeric keypad. The device is fully
operational either by voice commands or through a keypad.

An electronic scheduler in accordance with the present
invention comprises: (a) a real time clock, comprising
means for keeping current time and date; an alarm time
register; an alarm date register; means for identifying a
match between the current time and a set alarm time stored
in the alarm time register; means for identifying a match
between the current date and a set alarm date stored in the
alarm date register; and means for outputting an alarm time
reached signal or prerecorded message or sound when the
current time matches the set alarm time and the current
date matches the set alarm date; (b) a random access memory
(RAM) for storing units of compressed digital audio data
defining a message; (c) an audio storage/retrieval
processor comprising: a microphone for receiving audio
signals; amplifier means for amplifying the audio signals;
a first low pass filter; an A/D converter for converting
the amplified audio signals into digital audio data; data
compression means for compressing digital audio data from
the A/D converter into compressed digital audio data to be
stored in the RAM; means for retrieving digital audio data
from the RAM; means for expanding the retrieved data; means
for D/A converting the expanded data; means for filtering
the D/A converted data; means for amplifying the filtered
D/A converted data to reproduce the original input audio
signal during audio playback; (d) addressing means for
assigning addresses to the units of compressed digital
audio data, the addresses corresponding to storage
locations in the RAM at which the units of data are stored;
(e) keypad means, comprising alphanumeric keys and function
keys, for entering text information and alarm time and date
settings; (f) display means for displaying information
retrieved by the addressing means; (g) voice synthesis
means for synthesizing audible speech, including means for
synthesizing an audible indication that the electronic
scheduler is ready to accept a message to be stored, and
for synthesizing an audible readout of text entries entered
through the keypad means; and (h) secret option means for
entering protected information for limited access.

In one preferred embodiment of the present
invention, the real time clock comprises an oscillator, a
time counting unit incrementing continuously based on a
reference signal provided by the oscillator, a date
counting unit, and means for making a periodic comparison
between the alarm time stored in the alarm time register
and the current time provided by the time counting unit.
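
The periodic comparison between the alarm registers and the clock can be sketched as follows. This is only an illustration: the names `AlarmRegisters` and `alarm_reached` are hypothetical, and the minute-level resolution is an assumption, since the text does not state the granularity of the comparison.

```python
from dataclasses import dataclass
from datetime import date, datetime, time


@dataclass
class AlarmRegisters:
    """Hypothetical stand-in for the alarm time and alarm date registers."""
    alarm_time: time
    alarm_date: date


def alarm_reached(now: datetime, regs: AlarmRegisters) -> bool:
    """Return True when both the current time and the current date match.

    Assumption: times are compared at minute resolution.
    """
    time_match = (now.hour, now.minute) == (
        regs.alarm_time.hour,
        regs.alarm_time.minute,
    )
    date_match = now.date() == regs.alarm_date
    return time_match and date_match
```

A caller would invoke `alarm_reached` on each clock tick and raise the alarm time reached signal on the first `True` result.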

In addition, an electronic scheduler in accordance with the
present invention may comprise means for initiating a
programmed sequence that sounds an alarm or plays back a
recorded message in response to the alarm time reached
signal. Preferred embodiments may also comprise means for
marking selected memory addresses with an alarm time such
that corresponding data is to be played back or logged into
a scheduling network along with other messages to be played
back. Preferred embodiments may advantageously include
means for storing context information packets along with
selected messages, the context information indicating the
time the associated message was entered; alarm time(s)
associated with the associated message; the number of times
the message has been played; and a date on which the
message may be automatically erased.

An electronic scheduler in accordance with the present
invention may also comprise a read only memory (ROM)
containing prestored digitized commands, and means for
audibly reading out commands appearing on the display means
by extracting the prestored digitized commands from the
ROM. The addressing means may include a microprocessor or a
digital signal processor controlling information flow
between all components.

In addition, preferred embodiments may include a
read only memory (ROM) storing firmware, application
software, screen message data and prerecorded voice message
data. Preferred embodiments may also include means for
grouping entered information into data groups, where a
group can include name, telephone number, address, and
message, and search logic means for retrieving from memory
all the stored information in a group when only a portion
of the information is provided.

In addition, an electronic scheduler in
accordance with the present invention may include: (i)
speech patterning means for extracting identifying
parameters from the digital audio data; (j) speech
recognition means for comparing the extracted identifying
parameters to each of a group of reference identifying
parameters associated with a first reference vocabulary,
and producing a match indication as a function of the
comparing; (k) command logic means to effect the
performance of predetermined functions of the electronic
scheduler upon receiving the match indication; and (l)
interactive speech control means for controlling the
interaction of the command logic means with the voice
synthesis means such that the voice synthesis means
synthesizes prompts indicating when speech commands are to
be input and which options are available at a given
instant.

The first reference vocabulary may be either
factory installed and speaker-independent or be created
through a training process with spoken utterances or sounds
by extracting from the utterances or sounds identifying
parameters and storing the identifying parameters as the
group of identifying parameters for the first reference
vocabulary.

The speech recognition means may include means
for producing a nonmatch indication when the match
indication does not result from the input of a given spoken
utterance, the nonmatch indication indicating that the
given spoken utterance was not recognized.
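
The match and nonmatch indications can be illustrated with a nearest-template comparison. This is a sketch under stated assumptions, not the recognizer of the specification: the Euclidean distance measure, the fixed `threshold`, and the function name `recognize` are all hypothetical choices, since the text does not say how the extracted parameters are compared with the reference parameters.

```python
import math
from typing import Dict, Optional, Sequence


def recognize(
    features: Sequence[float],
    vocabulary: Dict[str, Sequence[float]],
    threshold: float,
) -> Optional[str]:
    """Compare extracted parameters with each reference template.

    Returns the best-matching word (match indication) or None
    (nonmatch indication) when no template is close enough.
    Assumption: Euclidean distance with a fixed rejection threshold.
    """
    best_word: Optional[str] = None
    best_dist = math.inf
    for word, reference in vocabulary.items():
        dist = math.dist(features, reference)
        if dist < best_dist:
            best_word, best_dist = word, dist
    return best_word if best_dist <= threshold else None
```

Returning `None` for a distant input corresponds to rejecting an utterance rather than guessing, which is what allows the device to ask the user to repeat it.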

In preferred embodiments, the predetermined
functions include: turning on and off the electronic
scheduler, retrieving specified stored information, setting
the alarm time associated with a particular recorded
message, and setting a secret code for limited data access.

Preferred embodiments may also include: (m) a
second reference vocabulary containing time and date
information for use by the speech recognition means to
extract time and date information from the extracted
identifying parameters; (n) alarm time logic means to allow
entry of an alarm time including time of day and date upon
the speech recognition means producing a match indication;
and (o) interactive speech recording means for controlling
the interaction of the command logic means with the voice
synthesis means, whereby the voice synthesis means
synthesizes speech or sound prompts for indicating the
required delivery time of an audio message input and which
options are available at a given instant.

In addition, an electronic scheduler in
accordance with the present invention may comprise means
for audibly confirming the content of the information
entered by the speech recognition means.

Thus, according to the invention, audio input is
converted from analog to digital and stored in random
access memory (RAM) for later retrieval. Unlike recording
an audio signal on tape media, digital memory storage
offers the control integrity and access that is necessary
for a scheduling/agenda system. Other digital mass storage
devices, such as optical or magnetic disk drives, can be
used either as a replacement for or in addition to the RAM.
For example, to be able to record "Call Joe at 555-1212"
and have this message alert one to this task shortly
before it must be executed requires that one be able to
program an alarm and have this particular message ready for
play at that time instant. Any audio input can thus be
automatically incorporated into the scheduling/agenda
system along with typed in information. The digitized
audio information simply receives a different storage
location in the memory.

WO 94/18667
PCT/US94/01597

The use of an audio input rather than keypad entry is an
option that is important for many reasons. The foremost
advantage is the ease and expediency with which a message
can be recorded. One of the main reasons electronic
organizers have captured only a small sector of the overall
market is the enormous patience required to enter
information into the device. Since the majority of the
market to which such a device appeals consists of busy
individuals, this is a significant deterrent.

Another advantage the audio input offers is in 
lowering the level of the user's required sophistication 
and familiarity with technology. 

Yet another advantage of the audio input is that 
information other than what can be expressed as 
alphanumeric characters can be recorded, such as music or
an individual's voice. 

The output of the device encompasses both visual 
display and audio output. The display shows previously 
entered messages, messages in the process of being typed 
in, commands, functions and more. When an audio message is
searched for and found, the display will indicate something
to the effect of "Audio Information, press <PLAY> to listen."

The audio output is achieved by accessing the 
particular block of data stored in the memory, passing it 
through a digital to analog converter, filtering it, 
amplifying it and outputting it through the speaker. The 
audio playback can be halted at any instant, played back 
repeatedly or saved for future reference. 

In addition, there is an audio output that reads 
out the commands appearing on the visual display. This is 
especially useful when visual contact with the device is 
limited (e.g., when driving). Audio commands and readouts
are accomplished by extracting prestored digitized commands
and digits from the ROM concurrent with the display of 
these commands on the display. 

The scheduling aspect of the device offers a
method of logging into and retrieving from memory telephone
numbers, addresses, appointments, meetings, and other
information and daily activities. These entries can be
classified into user defined or factory predefined
categories (e.g., personal entries, business entries).
Schedule inquiries can be made by date, by time, or by any
key-word present in the stored information.

A programmable alarm is coupled with the
scheduling aspects of the device. The alarm time, the time
the alarm turns on, can be appended to any of the entries
made, be it an appointment, a meeting, a phone call that
must be made at a specific hour or any other alarm related
need. Alarm times can be appended to audio inputs as well,
extending the utility of the audio input in an important
way. For example, an individual can quickly leave himself
a note to remind himself of a task to be performed by
simply speaking into the device and then keying in the hour
for the alarm to turn on. An option is also available
whereby the actual message entered will be played back
instead of an alarm.

The option of having a message automatically played back
instead of an alarm broadens the use to such things as
games, a fun programmable alarm clock that wakes one up to
whatever tune or message was audibly entered, a message
one individual leaves for another, and more.



The search capability offers access to stored
entries when only a portion of the particular entry to be
found is provided. The method used to search is what is
known in the field of artificial intelligence as a
"top-down" search. This involves first searching all the
name fields, then all the telephone number fields, then all
the address fields, then all the message fields, and
finally all the search index fields. The first item found
that satisfies the search is displayed. The system
continues to search through additional fields to locate
another match. If the appropriate key is pressed, the
system displays the next match found, until a message
"search complete" is displayed to announce that the entire
memory has been searched and all matching fields have been
found. In the case where additional matches are requested
and the system is still in the process of searching, the
message "searching" will flash on the screen to indicate
such a status.
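
The field-by-field search order described above can be sketched with a generator, which naturally yields one match per request in the way the "next match" key does. The field names in `FIELD_ORDER`, the dictionary representation of an entry, and the case-insensitive substring test are assumptions made for illustration.

```python
from typing import Dict, Iterator, List, Tuple

# Assumed field names, listed in the search order given in the text.
FIELD_ORDER = ("name", "telephone", "address", "message", "search_index")


def top_down_search(
    entries: List[Dict[str, str]], query: str
) -> Iterator[Tuple[int, str]]:
    """Yield (entry index, field name) for each field containing the query.

    All name fields are searched first, then all telephone fields, and so
    on. Exhausting the generator corresponds to "search complete".
    """
    needle = query.lower()
    for field in FIELD_ORDER:
        for index, entry in enumerate(entries):
            if needle in entry.get(field, "").lower():
                yield index, field
```

Calling `next()` on the returned generator plays the role of pressing the key that requests the next match.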

In order to ensure the security of the
information entered by the person using the device, an
option is available that limits access to prespecified
entries. A key labeled "SECRET" can be pressed before
entering information so that the information to be
currently entered can only be accessed by knowing a code
word. The code word is programmable and can be changed at
will.

Voice operation of the device is available
through user spoken commands. Before a command is entered,
the device provides an audio prompt to indicate when the
command should be spoken and what command options are
available at that instant. For example, after a message is
entered through the audio means, a reminder alarm may be
set for a specific time at which the message will be played
back. After the message is entered, the system will prompt
the user by announcing: "alarm ?". By saying "No" the
entry is complete and no alarm time has been appended. By
saying "Yes", the system prompts: "hour ?"; the user then
says one word indicating the hour. The system then
prompts: "minute ?"; the user then says two additional
digits specifying the minute. The system then prompts: "AM
?"; the user then says "Yes" for AM and "No" for PM. The
system then prompts: "done ?". By saying "Yes" the entry
is complete; by saying "No", additional prompts are
presented. Note that for this small example a vocabulary
of only 15 words is necessary, i.e., Yes, No and thirteen
digits (0, 1, ..., 12).
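
The prompt sequence above can be sketched as a small dialogue routine. The callables `prompt` and `listen` are hypothetical stand-ins for the voice synthesis and speech recognition means. Two behaviours are assumptions, because the text does not specify them: a "No" answer to "done ?" simply repeats the time prompts, and the result is converted to a 24-hour value.

```python
from typing import Callable, Optional, Tuple

# The 15-word vocabulary of the example: Yes, No and the numbers 0..12.
VOCABULARY = {"yes", "no"} | {str(n) for n in range(13)}


def alarm_dialogue(
    prompt: Callable[[str], None], listen: Callable[[], str]
) -> Optional[Tuple[int, int]]:
    """Run the alarm-entry dialogue; return (hour_24, minute) or None."""

    def hear() -> str:
        word = listen().lower()
        if word not in VOCABULARY:
            raise ValueError(f"unrecognized word: {word!r}")
        return word

    def ask(text: str) -> str:
        prompt(text)
        return hear()

    if ask("alarm ?") == "no":
        return None  # entry complete, no alarm appended
    while True:
        hour = int(ask("hour ?"))
        tens = int(ask("minute ?"))  # first of the two minute digits
        ones = int(hear())           # second minute digit
        is_am = ask("AM ?") == "yes"
        if ask("done ?") == "yes":
            # Assumed conversion: 12 AM maps to 0, 12 PM maps to 12.
            return hour % 12 + (0 if is_am else 12), tens * 10 + ones
```

Because each prompt restricts the valid answers to a handful of isolated words, the recognizer never has to locate a word inside a sentence.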

The main objective of this device is to use voice
as a means for entering information, such as appointments,
which will offer a user a more efficient and less demanding
mechanism for maintaining a schedule. In addition, the
present innovation introduces a method of exploiting voice
recognition for controlling the device's functionality.
Because the dominant constraint of the device is its size,
use of any sophisticated voice recognition algorithms that
demand high computational power is prohibitive. The
proposed method, however, offers a unique and practical
solution by which simpler, more portable and less
computationally demanding voice recognition algorithms can
be taken advantage of.

To this end, this specification also describes,
in detail, one of many possible hardware implementations of
the proposed objectives. The design pays attention to
power consumption, memory backup features, memory
management and voice recognition error minimization. Power
consumption is an important consideration in portable
usage, since voice synthesis, recording and playback
components consume a relatively large amount of power.
Storage of audio data in digital form requires relatively
large amounts of memory, and so memory management through
data compression and automatic erasing features is vital.
And, with audio confirmation and automatic rejection of
poorly received voice inputs, a means for reducing
recognition errors, at little computational burden, is
effected.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a block diagram of one embodiment of a
voice controlled appointment keeper according to the
present invention.

FIG. 2 depicts the audio storage/retrieval
processor 11 portion of the block diagram shown in FIG. 1.

FIG. 3 depicts the circuitry for the keypad
interface component 31 shown in FIG. 1.

FIG. 4 depicts the circuitry for the real-time
clock component 5 shown in FIG. 1.

FIG. 5 depicts the circuitry for the power
control component 3 shown in FIG. 1.

FIG. 6 is a flowchart of the voice recognition
and training sequence used to [illegible] function
execution and spoken information for data [illegible];
this algorithm is implemented in software and is located in
the voice recognition processor 43 in FIG. 1.

FIGS. 7A, 7B and 7C are flowcharts for the
interactive voice control and voice information entry
dialogue between the device and the user.

FIG. 8 is an example of interactive dialogue
between the device and the user for entering an alarm time,
corresponding to block 136 in FIG. 7A.

FIG. 9 is one possible outer appearance design of
the device for handheld size.



DETAILED DESCRIPTION OF ONE PREFERRED EMBODIMENT 

Reference is made to FIG. 1 showing the key
building blocks of a device suited for carrying out the
present invention. Component interconnectivity is
specified by lines; arrows designate direction of
information flow and lines without arrows indicate
bi-directional flow.

Two means of inputting information into the
device are available: by speaking into the microphone 29 or
by keying in on the keypad 33. The information entered
into the device is conveniently divided into two types of
information. The first is "control information", intended
to control the device in performing such functions as
playback, search the data base, set alarm time and the
like. The second type is "message information", which
includes telephone numbers, names, notes and the like, and
is usually intended for storage and retrieval purposes.
Voice recognition is performed mostly on the control type
information, while the audio inputted message type
information is limited to a stored digitized audio message.
Voice recognition can, however, be used to input some types
of message information, such as phone numbers, that are
stored in text format, while the name and other information
associated with that message information can be entered and
retrieved as message information. All typed text
information (names, numbers, addresses, memos, and
functional commands) entered through the keypad provides
both control and message type information.

The voice input undergoes the following 
processing before being ready for storage in the device's 
memory. Audio range frequency signals enter the microphone
29, where they are transformed into electrical signals and
transferred to the audio storage/retrieval processor 11.

Refer to FIG. 2 for a detailed description of the audio
storage/retrieval processor 11 in FIG. 1. The electrical
signals from the microphone are amplified by the input 
amplifier 50 (FIG. 2) to raise the signal level, and then 
passed through a low pass filter 51 to remove aliasing 
frequencies. The amplified anti-aliased audio signal is 
then digitized by an A/D converter 52, which converts the 
analog signal into a binary representation capable of being 
stored, retrieved, and manipulated by digital hardware. 
Compression of the binary representation is then performed 
by the data compressor 53 in order to increase the amount 
of recording time available for a given digital memory 
size. It is possible to achieve the same results using 
software to perform the data compression. However, the 
functional block is represented as hardware to show 
functional necessity. When the original audio signal is to 
be reproduced, the opposite operation, expansion (done in 
the data expansion block 54), must be performed on the 
stored compressed binary representation. The data rate
(the amount of information, in bits, required per unit time
to effectively represent the signal that is to be
reproduced) is a limiting factor in the amount of recording
time available for a given digital memory size.

There are several types of compression available
to reduce the effective data rate without sacrificing
signal integrity beyond intelligibility. One of these
methods is Adaptive Differential Pulse Code Modulation
(ADPCM), an algorithm capable of 2:1 to 8:1 compression
that was developed primarily for speech data compression
over telephone lines. There are several commercially
available integrated circuits that perform this compression
and expansion in hardware. In addition, there are current
offerings that incorporate the microphone amplifier,
filters, A/D converter, D/A converter, data compressor, and
data expander all on the same substrate. Other compression
techniques can be used in addition to, or as a replacement
for, ADPCM. Some algorithms are capable of compressing
random binary data at ratios of 1.5:1 to 10:1 (depending on
the redundancy factor of the data). Through the combined
use of more than one compression algorithm, the data rate
can be reduced to yield longer recording times without
expanding memory resources or significantly affecting audio
sound quality.
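
The relationship between data rate, compression ratio and recording time can be made concrete with a short calculation. The function name and the example figures in the usage note (1 MiB of memory, 8 kHz sampling, 8 bits per sample) are illustrative assumptions; the specification does not state them.

```python
def recording_seconds(
    memory_bytes: int,
    sample_rate_hz: int,
    bits_per_sample: int,
    compression_ratio: float,
) -> float:
    """Recording time available for a given memory size.

    The uncompressed data rate is sample_rate_hz * bits_per_sample bits per
    second; compression divides that rate by compression_ratio.
    """
    raw_bits_per_second = sample_rate_hz * bits_per_sample
    compressed_bits_per_second = raw_bits_per_second / compression_ratio
    return memory_bytes * 8 / compressed_bits_per_second
```

Under these assumed figures, 2:1 compression gives about 262 seconds of recording and 8:1 compression about 1049 seconds, which shows why the compression ratio directly sets the usable message capacity.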

The compressed data is stored in the RAM 9 (FIG. 1) for
later retrieval. Each converted input binary sequence
defining a message is assigned an address so that it can be
retrieved at any time, marked with an alarm time to be
played back when the alarm time is reached, or logged into
the scheduling network along with other typed in messages.
To effectively address the memory, a data "header" or
"footer" is stored with each message to indicate the
message length or, alternatively, an "end of message" or
"beginning of message" sequence is stored with each message
to define its memory location. An alternative and more
restrictive addressing method involves reserving a portion
of memory where a table listing the addresses of the
messages is located. In addition to this storage
addressing information, a context information packet is
also stored with the actual message. The context
information contains the time the message was entered; the
alarm(s), if any, that are associated with the message; the
number of times the message was played; a date by which the
message may be automatically erased; and any other
information (including a text reference message) that may
be used to control or track the message.
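
The length-header addressing scheme can be sketched as follows. The 4-byte big-endian header and the function names are assumptions for illustration; the context information packet is omitted to keep the sketch short.

```python
import struct
from typing import List

# Assumed layout: each message is preceded by a 4-byte length header.
HEADER = struct.Struct(">I")


def append_message(memory: bytearray, payload: bytes) -> int:
    """Store a message after the existing ones and return its address."""
    address = len(memory)
    memory.extend(HEADER.pack(len(payload)))
    memory.extend(payload)
    return address


def read_message(memory: bytearray, address: int) -> bytes:
    """Read the message whose length header starts at the given address."""
    (length,) = HEADER.unpack_from(memory, address)
    start = address + HEADER.size
    return bytes(memory[start:start + length])


def list_addresses(memory: bytearray) -> List[int]:
    """Walk the headers to recover every message address."""
    addresses, offset = [], 0
    while offset < len(memory):
        addresses.append(offset)
        (length,) = HEADER.unpack_from(memory, offset)
        offset += HEADER.size + length
    return addresses
```

Walking the headers avoids reserving a separate address table, at the cost of a linear scan to find a given message.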

Information can also be input via the keypad.
There are alphabet keys, numeric keys, and control keys
(see FIG. 9). Pressing the alphanumeric keys serves the
purpose of entering alphabetic letters and numbers as
indicated by the labels closest to each alphanumeric key.
Function keys are also available to execute such operations
as: search by various categories (e.g., telephone number,
first or last name, profession), calculator functions,
playback voice recorded message function, name entry,
telephone/facsimile number entry, address entry, user
defined search item entry, and memorandum entry. These
same functions and others can also be accessed through
screen selection prompts whereby the user selects the
number corresponding to a particular function displayed on
the screen. When a key is pressed on the keypad, the
keypad interface 31 (FIG. 1) senses it within the time it
periodically samples the keypad for a "keypress"
(approximately 0.1 seconds is a common period). A binary
code representing which particular key has been pressed is
passed to the CPU 37 where the preprogrammed operation for
that keypress is executed.

Refer to FIG. 3 for a detailed description of the keypad
interface 31, which is used by the system to find out which
key, if any, is being pressed on the keypad 33. This
particular implementation uses a row-by-row decode
technique. The CPU periodically selects a row to be read
through a latch 70. This latch is also used to control
other system functions (such as volume control in this
application) if there are leftover outputs. The latch data
is decoded by a decode circuit 71, a data selector, and the
selected row is read by selecting another latch 73. The
data read by the CPU is a bit pattern of the keypad, such
that any key can be checked individually for a depression.
The ON key is decoded separately so as to provide a switch
that works without the CPU. This is needed to provide a
user input for turning on the device during power-down
mode. A latch 74 is used to read this key during power-on,
as well as other signals. This circuit can be implemented
in many different ways, and is shown here to provide
functional completeness. The essential functionality
afforded by this circuit is the sensing of keypresses.
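
By way of illustration only, the row-by-row decode technique
may be sketched in software as follows. The function names,
the 4 x 4 keypad size, and the key code formula are
assumptions made for this sketch; read_row is a hypothetical
stand-in for selecting a row through the latch 70 and
reading the bit pattern back through the latch 73.

```python
from typing import Callable, List, Tuple


def scan_keypad(read_row: Callable[[int], int],
                n_rows: int, n_cols: int) -> List[Tuple[int, int]]:
    """Return (row, col) for every key currently pressed.

    read_row(r) is a hypothetical stand-in for selecting row r and
    reading back one bit per column (1 = key pressed).
    """
    pressed = []
    for row in range(n_rows):
        bits = read_row(row)
        for col in range(n_cols):
            if bits & (1 << col):
                pressed.append((row, col))
    return pressed


def key_code(row: int, col: int, n_cols: int) -> int:
    """Assumed binary code passed to the CPU for the key at (row, col)."""
    return row * n_cols + col


# Simulated 4 x 4 keypad state: one bit pattern per row.
rows_state = [0b0000, 0b0100, 0b0000, 0b0001]
hits = scan_keypad(lambda r: rows_state[r], 4, 4)
codes = [key_code(r, c, 4) for r, c in hits]
```

Because every row is read as a complete bit pattern, several
simultaneous key depressions can be distinguished in one
scan.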

There are three ways that information is presented to the
user: first, as an alphanumeric character or picture on the
display; second, the same as the first method but with the
addition of a synthesized voice that audibly reads out
whatever is displayed; or third, playback of the user's
recorded message.

If the information was entered by means of the keypad, or
audibly through voice recognition means, the first and
second output means are possible, i.e., by display means on
the face of the device and/or by synthesized voice means.
When the user desires to hear a synthesized voice output of
the display, each alphanumeric character that is displayed
also triggers the transmission of a particular prestored
digitized acoustic sound. The sequence of these sounds,
each sounding out one character, produces the sound of the
complete word/number. These digitized acoustic sounds are
prestored in the ROM 7 (FIG. 1). There are other ways of
producing the synthesized voice output, and many commercial
packages are readily available (e.g., AT&T DSP16 series)
(see further discussion below on voice synthesis).
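
The character-by-character readout described above may be
sketched as follows. This is a minimal illustration only:
the table ROM_SOUNDS, its sample values, and the function
name are hypothetical, and a real device would hold
digitized audio rather than the short integer lists used
here.

```python
from typing import Dict, List

# Hypothetical ROM table: one prestored digitized sound per character.
ROM_SOUNDS: Dict[str, List[int]] = {
    "1": [10, 11],
    "2": [20, 21, 22],
    "A": [65],
}


def synthesize_display(text: str, rom: Dict[str, List[int]]) -> List[int]:
    """Concatenate the prestored sound of each displayed character."""
    samples: List[int] = []
    for ch in text:
        # Characters with no prestored sound are skipped (an assumption).
        samples.extend(rom.get(ch, []))
    return samples


pcm = synthesize_display("A12", ROM_SOUNDS)
```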

If the information was entered by speaking into the
microphone as "message information" (i.e., no recognition
operation), only the third means of output is possible
(i.e., playback of the user's recorded message). The user
is alerted to the fact that this is the only possibility by
a message that appears on the Display 15 (FIG. 1) (e.g.,
"voice message - press <PLAY> to listen"). By pressing a
particular key, the recorded message is played back.

The operation by which the audio information is presented
to the user is shown in FIG. 2, which depicts the audio
storage/retrieval processor 11 of FIG. 1. As shown in
FIG. 2, outputting audio information is accomplished by
first addressing in the RAM the particular memory block
that is associated with the message to be heard. Then the
data is decompressed in the data expansion unit 54 into
binary data for the D/A converter 55. The D/A converter
converts the digital signal to a sampled audio signal,
which is passed through a Low Pass Filter 56 to remove
unwanted harmonics and then through an Amplifier 57, which
drives a Speaker 25. Once the message is heard, it can be
deleted, saved in memory, replayed, or tagged with a new
alarm for future reference, among other options.
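
The playback chain (addressing, expansion, filtering,
amplification) may be modelled in software as in the sketch
below. The description does not fix a compression scheme or
a filter design, so the delta encoding and the moving
average filter used here are assumptions chosen only to keep
the example short; the D/A conversion itself is a hardware
step and is not modelled.

```python
from typing import Dict, List


def expand(deltas: List[int]) -> List[int]:
    """Data expansion: undo an assumed delta encoding."""
    out, level = [], 0
    for d in deltas:
        level += d
        out.append(level)
    return out


def low_pass(samples: List[float], width: int = 3) -> List[float]:
    """Moving average over the last `width` samples (assumed filter)."""
    out = []
    for i in range(len(samples)):
        window = samples[max(0, i - width + 1): i + 1]
        out.append(sum(window) / len(window))
    return out


def amplify(samples: List[float], gain: float) -> List[float]:
    return [gain * s for s in samples]


def play_message(ram: Dict[int, List[int]], address: int,
                 gain: float = 2.0) -> List[float]:
    """Address the memory block, then expand, filter and amplify it."""
    return amplify(low_pass(expand(ram[address])), gain)


ram = {0x10: [3, 3, -6, 0]}          # hypothetical stored message block
speaker_signal = play_message(ram, 0x10)
```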

The LCD (liquid crystal display) Display 15 (FIG. 1) in
this example device is a text or graphic display for
providing information to the user. Information such as
status, options, or recorded information (previously typed
into the unit) can be shown on the display. The display is
controlled by the CPU as defined by the firmware stored in
the ROM.

The CPU 37 (FIG. 1) controls the information flow and
operational functions of the unit. At each CPU instruction
cycle, an instruction is received from the ROM. The CPU
then transfers data between itself and an external device
or between two external devices. The CPU operation can be
interrupted periodically to handle maintenance functions
such as reading the keypad for key presses, checking
whether any of the alarm settings has reached its term
(i.e., comparing the stored alarm times with the current
time), and checking whether the "ON" button is depressed or
the main battery level is low. The Oscillator 13 (FIG. 1)
provides a time base for the internal functions of the CPU.
As shown in FIG. 1, this same time base may be used as a
reference for the audio storage/retrieval processor 11.

The address decode circuit 35 shown in FIG. 1 controls the
CPU's access to the peripherals or devices of the system.
This unit effectively divides the address space of the CPU
into portions big enough for the individual peripherals or
devices.
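
The division of the address space may be pictured with the
following sketch. The memory map shown is entirely
hypothetical; the description does not give address ranges,
and the peripheral names are used only as labels.

```python
from typing import Optional

# Hypothetical memory map: (name, first address, last address), inclusive.
ADDRESS_MAP = [
    ("ROM", 0x0000, 0x7FFF),
    ("RAM", 0x8000, 0xEFFF),
    ("KEYPAD", 0xF000, 0xF00F),
    ("DISPLAY", 0xF010, 0xF01F),
    ("RTC", 0xF020, 0xF02F),
]


def decode(address: int) -> Optional[str]:
    """Return the device selected by an address, or None if unmapped."""
    for name, first, last in ADDRESS_MAP:
        if first <= address <= last:
            return name
    return None
```

For example, under this assumed map decode(0xF015) selects
the display, while an address outside every range selects no
device.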

The read only memory (ROM) 7 shown in FIG. 1 contains the
firmware (hardware drivers), application software, screen
message data, pre-recorded voice messages (user alerts,
etc.), and other data required for operation of the device.
Its function is to provide preprogrammed instructions for
the CPU. At each instruction cycle of the CPU, the ROM
receives a binary address from the CPU, and presents the
data corresponding to that address.

The random access memory (RAM) 9 shown in FIG. 1 stores
text and voice data along with some statistical information
about the data. The information contained within the RAM
can be accessed by the CPU at any time, and its
organization is entirely dependent upon the software
driving the system. Additional digital mass storage devices
can be used, such as optical or magnetic disk drives.

The real time clock 5 shown in FIG. 1 stores and counts the
time of day, day of the week, day of the month, month, and
year with the use of an oscillator. The alarm function of
the real time clock is used to wake up the unit during
stand-by mode when one of the alarm times that were set
matches the current time of the clock. In addition, the
periodic interrupt function of this component is used to
provide the CPU with interrupts at regular intervals in
order to scan the keypad, check the alarm times, etc. The
input to the real time clock is primarily for setting the
clock time and for setting the alarm time.
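
The alarm comparison performed at each periodic interrupt
may be sketched as follows. The one-minute resolution and
the function name are assumptions; the description states
only that stored alarm times are compared with the current
time.

```python
from datetime import datetime
from typing import List


def due_alarms(now: datetime, alarms: List[datetime]) -> List[datetime]:
    """Return the alarms whose date and time match the clock to the minute."""
    key = now.replace(second=0, microsecond=0)
    return [a for a in alarms if a.replace(second=0, microsecond=0) == key]


alarms = [datetime(1994, 2, 14, 9, 30), datetime(1994, 2, 14, 17, 0)]
hit = due_alarms(datetime(1994, 2, 14, 9, 30, 12), alarms)
miss = due_alarms(datetime(1994, 2, 15, 9, 30), alarms)
```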

Refer to FIG. 4 for the circuitry of the real time clock 5
in FIG. 1. The time counting unit 91 and date counting unit
92 increment continuously based on the reference frequency
provided by the oscillator 90. This counting continues even
during stand-by mode (power-off). The current time and date
for these units can be set by the CPU at the user's
discretion. The time and date set and updated in these
units are the source from which the CPU

simple momentary switch, and the reaching of an alarm time.
Flip-flop 101 is SET by the alarm signal 98 (FIG. 4) or the
ON signal 76 (FIG. 3). The output of the flip-flop controls
the DC/DC converter 102, turning it on or off, which
applies or disconnects the power to the circuit in order to
conserve the battery while the unit is not in use. The
flip-flop is turned off by the time-out of the One Shot 103
(retriggerable), which is caused by the absence of keypad
scans. The keypad scans can be stopped purposely, or by a
CPU failure. The low battery detect circuit 105 provides
the system with a signal (low battery) that indicates a
battery voltage lower than a predefined value. This
indication can be used to alert the user or shut down the
system during an out-of-tolerance condition. The reset
circuit 104 provides the system with a momentary set-up
time after power-on to allow for oscillator stabilization
and low level hardware initialization, and is also
necessary for the device start-up (power on). The reset
function is built into some CPUs, and is therefore shown
only for functional completeness. This power method is
replaceable with a simple switch, or can be implemented
without a DC/DC converter.
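
The interplay of the flip-flop and the retriggerable one
shot may be modelled behaviourally as in the sketch below.
The class name, the tick-based time unit, and the timeout
value are assumptions made for illustration; the circuit
itself is hardware and no timing figures are given in the
description.

```python
class PowerControl:
    """Behavioural model of a set/reset flip-flop with a retriggerable one shot."""

    def __init__(self, timeout: int) -> None:
        self.timeout = timeout    # one-shot period in ticks (assumed unit)
        self.remaining = 0
        self.powered = False      # flip-flop output driving the converter

    def set(self) -> None:
        """The ON signal or the alarm signal sets the flip-flop."""
        self.powered = True
        self.remaining = self.timeout

    def keypad_scan(self) -> None:
        """Each keypad scan retriggers the one shot while power is on."""
        if self.powered:
            self.remaining = self.timeout

    def tick(self) -> None:
        """Advance one tick; a time-out turns the flip-flop off."""
        if self.powered:
            self.remaining -= 1
            if self.remaining <= 0:
                self.powered = False


pc = PowerControl(timeout=3)
pc.set()
pc.tick()
pc.tick()
pc.keypad_scan()                  # scan arrives before the time-out
pc.tick()
pc.tick()
still_on = pc.powered
pc.tick()                         # no further scans: the one shot expires
now_off = not pc.powered
```

In this model a CPU failure has the same effect as a
deliberate stop of the keypad scans: the one shot is no
longer retriggered and power is removed.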

The external interface 16 shown in FIG. 1 is for the
purpose of transferring the memory contents to a personal
computer or providing additional memory to the device. This
interface is a means by which the information stored within
the device can be archived by the user through downloading
the data to a personal computer or storing it in external
removable non-volatile memory. Alternatively, this
interface can be used to upload information into the
device's memory (e.g., voice mail). The expansion of
storage capability is also facilitated by this interface
for extending the available audio recording time and text
storage capacity of the device.

Voice recognition of spoken command and spoken data
information is performed in the following manner. As FIG. 6
shows, the A/D converter 52 (FIG. 2) converts a spoken
utterance into a digital data stream and passes it through
the feature extractor 110, where the "identifying
parameters" of the data stream are extracted. Each
reference word in the vocabulary has identifying parameters
that distinguish it from the other words in the same
vocabulary. The feature extractor is followed by the edge
detector 111, which locates the points in the recorded time
frame where the particular utterance actually began and
ended (there may be some instant of delay before speaking
the word) and uses those points as the points of reference
for the beginning and ending of the relevant identifying
parameters. Ascertaining the beginning and the ending
boundaries of the utterance is an important element in
forming identifying parameter sets that are comparable to
those in the reference vocabulary. A parameter set formed
from a time shifted version of the same uttered word, or an
expanded or compressed version of the same uttered word,
can lead to very different identifying parameters, and
consequently, an error in the classification of the
utterance.
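
One simple way to locate the utterance boundaries is to
threshold the per-frame signal energy, as sketched below.
The description does not state how the edge detector 111
operates, so the energy criterion, the threshold value, and
the function name are assumptions.

```python
from typing import List, Optional, Tuple


def find_edges(energy: List[float],
               threshold: float) -> Optional[Tuple[int, int]]:
    """Return (start, end) frame indices of the utterance, end exclusive.

    Frames whose energy exceeds the threshold are treated as speech;
    None is returned when no frame does.
    """
    active = [i for i, e in enumerate(energy) if e > threshold]
    if not active:
        return None
    return active[0], active[-1] + 1


frames = [0.1, 0.0, 0.2, 3.0, 4.5, 2.8, 0.1, 0.0]
edges = find_edges(frames, threshold=1.0)
```

Only the frames between the two edges would then contribute
to the identifying parameters, so that a delay before
speaking does not shift the parameter set.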

The identifying parameters extracted from the incoming
spoken utterance are compared 112 to each group of
identifying parameters belonging to each of the words in
the reference vocabulary 117, and a measure of distance is
made for each one (e.g., the Hamming distance for binary
code words). The comparison that produces the best (e.g.,
the smallest) distance is considered to be the best match,
and its measured distance is then compared to a defined
threshold 114 to ascertain whether this best match is close
enough to be considered the same. The final decision 115 is
reached as follows: if the best distance measured does not
pass the threshold, the utterance is not considered to
match any of the words in the reference vocabulary and the
final decision is to request a retry (that the same
utterance be produced again for another attempt at
recognition); if the best distance measured does pass the
threshold, the utterance is considered to be that word

systems that are capable of recognizing the spoken
utterances independent of which speaker spoke them. This is
usually accomplished by having a large number of speakers
utter the words in the defined vocabulary. Each speaker
will produce identifying parameters for each of the words
in the defined vocabulary that are slightly different from
those produced by any other speaker. The different
identifying parameters produced by each of the speakers can
then be averaged and used as the "speaker-independent
identifying parameters".
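
The averaging of parameters across speakers, together with
the best-match and threshold decision described earlier, may
be sketched as follows. The two-element parameter vectors,
the Euclidean distance, and the threshold value are
assumptions made for this sketch (the description mentions
the Hamming distance as one example for binary code words).

```python
import math
from typing import Dict, List, Optional


def average(vectors: List[List[float]]) -> List[float]:
    """Average the parameter vectors produced by several speakers."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def distance(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two parameter vectors (assumed measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(params: List[float], vocabulary: Dict[str, List[float]],
             threshold: float) -> Optional[str]:
    """Return the best-matching word, or None to request a retry."""
    best = min(vocabulary, key=lambda w: distance(params, vocabulary[w]))
    if distance(params, vocabulary[best]) <= threshold:
        return best
    return None


# Hypothetical training data: two speakers per word.
training = {
    "ALARM": [[1.0, 0.0], [3.0, 0.0]],
    "NAME": [[0.0, 4.0], [0.0, 6.0]],
}
vocab = {word: average(samples) for word, samples in training.items()}
accepted = classify([2.5, 0.5], vocab, threshold=1.0)
rejected = classify([10.0, 10.0], vocab, threshold=1.0)
```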

The voice output for the purpose of sounding information
and prompts (e.g., data required from the user) to the user
is shown as the voice synthesis processor 12 block in
FIG. 1. The functionality necessary for this task is that
required for storage of voice patterns, such as the audio
message storage means, and that required for audio
playback, such as said audio retrieval means 11 (FIG. 1 or
FIG. 2). Following the audio storage means in FIG. 2, the
voice synthesized data patterns, after data compression 56,
are transferred through the CPU data bus to the ROM for
storage, along with the other data necessary for system
operation. Output of the voice synthesis operation is
initiated in different situations: it can be automatic when
reading the display output, if such an option is selected
by the user; it can be part of a dialogue (user prompts)
for voice or keypad control command entry; it can be part
of a dialogue for voice or keypad data information entry;
or it can occur in other situations, such as the sounding
of voice alarms. The times at which this occurs are
controlled by the software of the system. The data for
producing the intended words or sounds is read from the ROM
and sent through the CPU data bus to the audio
storage/retrieval processor 11 (FIG. 1 or FIG. 2), where it
is expanded 54 (FIG. 2), D/A converted 55 (FIG. 2),
filtered 56 (FIG. 2) and amplified 57 (FIG. 2) for
listening by the user.

This method for voice synthesis is accomplished without
additional hardware requirement. However, the drawback to
this method is a significant increase in the ROM size needed
to store the extensive data for the vocabulary's speech
patterns. The use of a separate voice synthesizer employing
allophone, phoneme, LPC, or other comparable methods would
reduce the storage requirement of the needed vocabulary at
the expense of additional hardware. With this approach, the
memory requirement is reduced because only the pointers to
the pattern sequences needed to reproduce the intended
utterances are stored. Alternatively, the patterns for the
allophones or phonemes can be stored in the system ROM,
along with the vocabulary look-up tables (where the needed
allophones or phonemes for each word are listed). The
particular implementation would dictate the necessity of
additional hardware for the voice synthesis requirement.
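
The look-up table arrangement may be illustrated by the
sketch below, in which each word is stored only as a list of
pointers into a shared phoneme table. The phoneme labels,
the sample values, and the two-word vocabulary are
hypothetical.

```python
from typing import Dict, List

# Hypothetical phoneme patterns, stored once.
PHONEMES: Dict[str, List[int]] = {
    "EH": [1, 2],
    "N": [3],
    "T": [4],
    "ER": [5, 6],
}

# Hypothetical vocabulary look-up table: each word lists its phonemes.
LOOKUP: Dict[str, List[str]] = {
    "ENTER": ["EH", "N", "T", "ER"],
    "TEN": ["T", "EH", "N"],
}


def speak(word: str) -> List[int]:
    """Assemble the sample sequence for a word from its phoneme pointers."""
    samples: List[int] = []
    for name in LOOKUP[word]:
        samples.extend(PHONEMES[name])
    return samples
```

Both words reuse the same stored patterns, which is the
source of the storage saving as the vocabulary grows.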

In order to enter commands or enter data information into
the device through audio means, an interactive method is
devised. This interactive method employs the voice
synthesis means and the voice recognition means, both
discussed above, in the following fashion. Refer to FIG. 7
for a flowchart of one example of a software implementation
of the interactive method. The recognition process can be
initiated by either engaging an external switch 130 or
entering an audio message 160. (Other uses for the voice
recognition system include voice recognition training 116
(FIG. 6).) When the external switch is used to initiate
recognition, the device prompts the user through voice
synthesis means to enter a command by sounding "ENTER
ALARM". The user says "ALARM" 135, "SCHEDULE" 138, "NAME"
148 or "VOLUME" 151, and the device uses the voice
recognition means to classify the uttered command as one of
these possible commands in its vocabulary. A typical
dialogue for information entry is given in FIG. 8, which
illustrates an example of alarm time entry 135 dialogue.
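
A prompt-and-recognize step of the interactive method may be
sketched as follows. The functions say and listen are
hypothetical stand-ins for the voice synthesis means and the
voice recognition means, and the retry prompt "PLEASE
REPEAT" and the limit of three attempts are assumptions not
taken from FIG. 7.

```python
from typing import Callable, List, Optional

COMMANDS = ["ALARM", "SCHEDULE", "NAME", "VOLUME"]


def prompt_for(prompt: str, allowed: List[str],
               say: Callable[[str], None],
               listen: Callable[[], str],
               max_tries: int = 3) -> Optional[str]:
    """Sound a prompt, then accept only a word from the small vocabulary."""
    for _ in range(max_tries):
        say(prompt)
        heard = listen()
        if heard in allowed:       # stands in for the recognition decision
            return heard
        say("PLEASE REPEAT")       # hypothetical retry prompt
    return None


said: List[str] = []
utterances = iter(["MUMBLE", "ALARM"])
result = prompt_for("ENTER ALARM", COMMANDS, said.append,
                    lambda: next(utterances))
```

Because each prompt restricts the expected replies to a few
words, the recognizer only has to discriminate among that
short list at any one step.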

The interactive method effectively reduces the vocabulary
size for voice recognition and therefore also

Finally, many modifications and variations of the exemplary
embodiment specified herein will fall within the true scope
of the present invention. For example, the interactive
method of controlling the functionality of the device and
entering messages into its memory banks is also applicable
to larger computer based systems, and it is especially
helpful in applications where manual or visual contact with
the device is limited. In such cases the applications may
not be intended for scheduling, but rather for information
entry and functional control in general. In addition, it
should be made clear that the audio entry for message
storage and the audio entry for voice recognition may be
different hardware or software processes depending on the
particular recognition algorithm used. Yet another variant
of the present invention would include a personal computer
link for downloading inputted information, for loading into
the device information entered through a personal computer,
or for nonvolatile memory expansion. Such a link could of
course be used for facsimile transmission of information if
needed. Accordingly, the scope of protection of the
following claims is intended to be broad enough to cover
all such modifications and variations.

We claim:

1. An electronic scheduler, comprising:

(a) a real time clock, comprising means for keeping current
time and date; an alarm time register; an alarm date
register; means for identifying a match between the current
time and a set alarm time stored in the alarm [...]
identifying a match between the [...] signal when the
current alarm time matches the set alarm time and the
current alarm date matches the set alarm date;

(b) a memory for storing units of compressed digital audio
data defining a message;

(c) an audio storage/retrieval processor operatively
coupled to said memory, comprising: a microphone for
receiving audio signals; an amplifier operatively coupled
to said microphone for amplifying said audio signals; an
A/D converter for converting the amplified audio signals
into digital audio data; compression means for compressing
digital audio data from said A/D converter into [...] audio
[...] stored in said memory; means for retrieving [...]
data from said memory; means for expanding the retrieved
data; means for D/A converting the expanded data; [...]
filtering the D/A converted data; means for [...] filtered
D/A converted data to reproduce the original input audio
signal during audio playback;

(d) addressing means, operatively coupled to said memory
and said audio storage/retrieval processor, for assigning
addresses to said units of compressed digital [...]
locations in said memory at which said units of data [...]
stored;

(e) keypad means, comprising alphanumeric keys and function
keys, for entering text information and alarm time and date
settings;

(f) display means for displaying information retrieved by
said addressing means;

(g) voice synthesis means for synthesizing audible speech,
including means for synthesizing an audible indication that
the electronic scheduler is [...] to accept a message to be
stored, and for synthesizing an audible readout of text
entries entered through said keypad means.

2. An electronic scheduler as recited in claim 1, wherein
said real time clock comprises an oscillator, a time
counting unit incrementing continuously based on a
reference signal provided by said oscillator, and means for
making a periodic comparison between the alarm time stored
in the alarm time register and the current time provided by
the time counting unit.

3. An electronic scheduler as recited in claim 2, further
comprising means for initiating a programmed sequence that
sounds an alarm or plays back a message in response to said
alarm time reached signal.

4. An electronic scheduler as recited in claim 1, further
comprising means for marking [...] with an alarm time at
which time corresponding data is to be played back or
logged into a scheduling network along with other messages
to be played back.

5. An electronic scheduler as recited in claim 1, further
comprising means for storing context information packets
associated with selected messages, said context information
indicating the time the associated message was entered;
alarm time(s) corresponding to the message; the number of
times the message has [...]; and a date on which the
message may be automatically erased.

6. An electronic scheduler as recited in claim 1, further
comprising a read only memory (ROM) containing prestored
digitized commands, and means for audibly reading out
commands appearing on said display means by extracting said
prestored digitized commands from the ROM.

7. An electronic scheduler as recited in claim 1, wherein
said addressing means comprises one member of a group
including a microprocessor and a digital signal processor
controlling information flow between all components.

8. An electronic scheduler as recited in claim 1, further
comprising a read only memory (ROM) storing firmware,
application software, screen message data and prerecorded
voice message data.

9. An electronic scheduler as recited in claim 
1, further comprising means for grouping entered 
information into data groups including name, telephone 
number, address, and message; and search logic means for 
retrieving from memory all said stored information in said 
data groups when only a portion of said information is 
provided. 

10. An electronic scheduler as recited in claim 9, further
comprising:

(i) speech patterning means for extracting identifying
parameters from digital audio data;

(j) speech recognition means for comparing the extracted
identifying parameters to each of a group of reference
identifying parameters associated with a first reference
vocabulary, and producing a match indication as a function
of said comparing;

(k) command logic means coupled to said speech recognition
means to effect the performance of predetermined functions
of the electronic scheduler upon receiving said match
indication; and

(l) interactive speech control means for controlling the
interaction of said command logic means with said voice
synthesis means such that said voice synthesis means
synthesizes prompts indicating when and which speech
commands are to be input.

11. An electronic scheduler as recited in claim 10, wherein
said first reference vocabulary is either factory installed
and speaker-independent or is created through a training
process with spoken utterances or sounds by extracting from
said utterances or sounds reference identifying parameters
and storing said reference identifying parameters as the
group of reference identifying parameters for the first
reference vocabulary.

12. An electronic scheduler as recited in claim 11, wherein
said speech recognition means comprises means for producing
a nonmatch indication when said match indication does not
result from the input of a given spoken utterance, said
nonmatch indication indicating that said given spoken
utterance was not recognized.

13. An electronic scheduler as recited in claim 10, wherein
said predetermined functions include: turning on and off
the electronic scheduler, retrieving specified stored
information, setting the alarm time associated with a
particular recorded message, entering name and telephone
number, and setting a secret code for limited data access.

14. An electronic scheduler as recited in claim 10, further
comprising:

(m) a second reference vocabulary containing time
information for use by said speech recognition means to
extract time information from the extracted identifying
parameters;

(n) alarm time logic means coupled to said speech
recognition means to allow entry of an alarm time upon said
speech recognition means producing a match indication; and

(o) first interactive speech recording means for
controlling the interaction of said alarm time logic means
with said voice synthesis means, whereby said voice
synthesis means synthesizes speech or sound prompts for
indicating the required delivery time of alarm time.

15. An electronic scheduler as recited in claim 10, further
comprising:

(p) a third reference vocabulary containing name and
telephone number information for use by said speech
recognition means to extract name and telephone number
information from the extracted identifying parameters;

(q) name and telephone number logic means coupled to said
speech recognition means to allow entry of a name and
telephone number including time of day and date upon said
speech recognition means producing a match indication; and

(r) second interactive speech recording means for
controlling the interaction of said name and telephone
number logic means with said voice synthesis means, whereby
said voice synthesis means synthesizes speech or sound
prompts for indicating the required delivery time of name
and telephone number input.

16. An electronic scheduler as recited in claim 
10, further comprising means for audibly confirming 
information entered by said speech recognition means. 

17. An electronic scheduler as recited in claim 1, further
comprising a personal computer link to external computer or
memory for archiving information stored in said internal
device memory or transferring data into or out of the
electronic scheduler.

18. An electronic scheduler as recited in claim 1, further
comprising secret option means for entering protected
information for limited access.

19. An electronic scheduler as recited in claim 1, further
comprising auto dialing means for producing a series of
audible tones in appropriate frequency ranges to dial a
recorded or typed in telephone number.

20. A method for using speech to operate an electronic
device, comprising:

(a) a step for receiving spoken utterances as audio data
and converting said utterances into digital spoken
utterance data;

(b) speech patterning step whereby identifying parameters
are extracted from said digital spoken utterance data;

(c) speech recognition step; comparing said extracted
identifying parameters to each of a group of reference
identifying parameters associated with a reference
vocabulary, and producing a match indication as a function
of said comparing; each word in said reference vocabulary,
containing words for device functional command information,
obtains an associated group of identifying parameters by
forming speech patterns by speech patterning step;

(d) command function execution step for performing
predetermined functions of the device upon receiving said
match indication from speech recognition step;

(e) audible speech synthesis step for producing intended
words or sounds upon retrieving from memory, expanding, D/A
converting, filtering and amplifying for listening by the
user; and

(f) interactive device control step for audibly delivering
speech or sound prompts generated through voice
synthesizing step, indicating required delivery time of
speech commands relevant at each moment of operation, and
then using said selected commands to perform one of a group
of predetermined functions.

21. A method for using speech to operate an electronic
device as recited in claim 20, wherein said predetermined
functions include: turning on [...] the electronic device,
retrieving specified stored information, setting the alarm
time associated with a particular recorded message,
searching for stored information by providing only a
portion of said stored information and setting a secret
code for limited data access.

22. A method for using speech to operate an electronic
device as recited in claim 20, further comprising
interactive speech recording step for controlling the
interaction of said command function execution step with
said means for synthesizing audible speech or sound whereby
said synthesis means audibly delivers speech-like or sound
prompts for the purpose of indicating the required delivery
time of said audio message input.

23. A method for using speech to operate an electronic
device as recited in claim 20, wherein said synthesizing
step audibly confirms information entered by said speech
recognition step.